Methods, Requirements and Licenses for Shared NLG Resources
Abstract
Tools and data that can be shared in the Natural Language Generation community require common standards for data collection, documentation, implementation and licensing, as well as a central place to find such resources. We argue for open, free, well-documented and simply-structured resources, and introduce a free and open online repository of NLG resources.

1 Shared tasks and shared resources

Work in Natural Language Processing has come to depend on automated evaluation schemes which provide empirical measures of success. This has led to constructive competition between groups, for instance in syntactic parsing, information extraction, and machine translation. It has also led to rapid improvements on localized problems, but not yet to large end-to-end systems. Such schemes need large quantities of common, annotated data to deduce statistical relationships between produced and desired output. They need common and reusable, and therefore controlled, test components or tasks.

Meanwhile, NLG groups have used hand-crafted data to demonstrate and qualitatively evaluate their systems. The NLG community has yet to produce common, standardized datasets, although this has slowly been changing. For instance, participants at the 2005 European Workshop on NLG expressed their desire to establish shared data, consisting of structured databases with domain-specific content and “gold standard” human-written results. Shared resources could allow for a centralized and coordinated evaluation of systems performing a shared task. Similarly, new systems could be evaluated against older competing ones.

In this context, the idea of reusable software tools becomes very attractive. Any new NLG module needs to interface with existing ones before it can be sensibly evaluated by humans. For instance, a new adaptive realizer for a dialogue system will need a backend to supply selected content and a user model. Unfortunately, many of the systems created are domain-specific demos.
Here, the underlying, novel principles are meant to be reusable, but their implementations are not. In the following, we propose some requirements for sharable data which should make implementations subsequently more reusable. We also announce a common directory of resources, which is already available to the community.

On the downside, we want to avoid a situation where researchers produce code to beat the automatic score rather than make progress on solving the important challenges. For instance, summarization research shies away from systems that aggregate and paraphrase because they would not score well under the standard ROUGE metric. Automatic evaluation must therefore be used with caution.

2 Other Fields and their Ethical and Organizational Considerations

Fields differ greatly in their adoption of data sharing. Genetics researchers, for instance, adopted the idea early on, and the US National Institutes of Health mandated it in 2003. Even without being pressured by funding bodies, we are under a moral obligation to share data once it has been collected with the help of public funds. To address concerns about the validity of studies carried out with these data, the exact collection method should be documented, so that resulting constraints on the analysis are evident.

Ethical considerations are important in any setting. Carrying out NLG data collection experiments will commonly pose few difficulties. However, we suggest that participants be asked to permit the free dissemination of the recorded data, even if distribution is not immediately planned. Any anonymization, where necessary for privacy reasons, should be planned and agreed beforehand.
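The caution about ROUGE can be made concrete. The sketch below computes a ROUGE-1-style unigram recall, assuming simple whitespace tokenization; the example sentences are invented for illustration. A near-verbatim extract scores high, while a faithful paraphrase scores near zero, which is exactly the bias against aggregation and paraphrasing described above.

```python
from collections import Counter

def rouge1_recall(candidate: str, reference: str) -> float:
    """ROUGE-1 recall: fraction of reference unigrams also found in the candidate."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each reference word counts at most as often as it occurs.
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    return overlap / max(sum(ref.values()), 1)

reference = "the committee approved the budget on friday"
extractive = "the committee approved the budget"          # near-verbatim extract
paraphrase = "lawmakers signed off on the spending plan"  # same meaning, new words

print(rouge1_recall(extractive, reference))  # high: most reference words reappear
print(rouge1_recall(paraphrase, reference))  # low, despite equivalent content
```

This is of course a simplification of the full metric (which also covers longer n-grams and longest common subsequences), but it shows why surface-overlap scores reward copying over rephrasing.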
Similar resources
A Repository of Data and Evaluation Resources for Natural Language Generation
Starting in 2007, the field of natural language generation (NLG) has organised shared-task evaluation events every year, under the Generation Challenges umbrella. In the course of these shared tasks, a wealth of data has been created, along with associated task definitions and evaluation regimes. In other contexts too, sharable NLG data is now being created. In this paper, we describe the onlin...
Pragmatic Influences on Sentence Planning and Surface Realization: Implications for Evaluation
Three questions to ask of a proposal for a shared evaluation task are: whether to evaluate, what to evaluate and how to evaluate. For NLG, shared evaluation resources could be a very positive development. In this statement I address two issues related to the what and how of evaluation: establishing a “big picture” evaluation framework, and evaluating generation in context.
Automatic Evaluation of Referring Expression Generation Is Possible
Shared evaluation metrics and tasks are now well established in many fields of Natural Language Processing. However, the Natural Language Generation (NLG) community is still lacking common methods for assessing and comparing the quality of systems. A number of issues that complicate automatic evaluation of NLG systems have been discussed in the literature. The most fundamental observation in ...
Discussion Panel on Evaluation in Generation Research
Evaluation is critical in offering feedback on progress to both developers and potential consumers of NLG technology. However, evaluation has thus far not been as well established in NLG as it has become in NLU. This panel will discuss evaluation methods and resources. It is aimed at building a better understanding of NLG evaluation methods, and hopefully arriving at steps to facilitate future ev...
Validating the web-based evaluation of NLG systems
The GIVE Challenge is a recent shared task in which NLG systems are evaluated over the Internet. In this paper, we validate this novel NLG evaluation methodology by comparing the Internet-based results with results we collected in a lab experiment. We find that the results delivered by both methods are consistent, but the Internet-based approach offers the statistical power necessary for more fi...
Publication date: 2006